virtual memory


Black Mirror is now a delightful escape from reality

Engadget

The latest season of Black Mirror feels almost therapeutic as we peer over the cliff of civilizational collapse. Everything is awful, but at least we don't have to worry about renting out access to our brains from skeevy startups, or dealing with the consequences of a PC game's super-intelligent AI. While Black Mirror felt like a horrifying harbinger of an over-teched future when it debuted in 2011, now it's practically an escape from the fresh hell of real-world headlines. That's not to say that the show has lost any of the acerbic bite of creator Charlie Brooker. But now Brooker and his writers -- Ms. Marvel showrunner Bisha K. Ali, William Bridges, Ella Road and Bekka Bowling -- wield their talent for cultural analysis more deftly. Not all of the new episodes revolve around nefarious new tech; sometimes the tools themselves are genuinely helpful -- it's humans who are often the real problem.


vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention

arXiv.org Artificial Intelligence

Efficient management of GPU memory is essential for high-throughput LLM inference. Prior systems reserved KV-cache memory ahead of time, which wasted capacity due to internal fragmentation. Inspired by demand paging, vLLM proposed PagedAttention to enable dynamic memory allocation for the KV-cache. This approach eliminates fragmentation and improves serving throughput. However, to allocate physical memory dynamically, PagedAttention changes the layout of the KV-cache from contiguous virtual memory to non-contiguous virtual memory. As a consequence, one needs to rewrite the attention kernels to support paging and to implement a memory manager in the serving framework. This results in both performance and programming overheads, as well as portability challenges in adopting state-of-the-art attention kernels. In this paper, we propose vAttention, a new approach for dynamic KV-cache memory management. In contrast to PagedAttention, vAttention stores the KV-cache in contiguous virtual memory and leverages OS support for on-demand allocation of physical memory. vAttention thus enables one to use state-of-the-art attention kernels out of the box, adding support for dynamic allocation of physical memory without having to rewrite their code. We implement vAttention in the vLLM serving stack to show that it also improves decode throughput by up to 1.99x over vLLM, and end-to-end serving throughput by up to 1.22x and 1.29x compared to the state-of-the-art PagedAttention-based kernels of FlashAttention and FlashInfer, respectively.
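The contrast between the two layouts can be illustrated with a toy model. The sketch below is not vAttention's implementation. It is a plain-Python simulation of a per-request KV-cache that reserves one contiguous virtual range up front and backs it with physical pages only when a token first touches them. The class name `ContiguousKVCache`, the page size, and the byte counts are hypothetical. On a GPU, this kind of scheme depends on driver support for reserving virtual address space separately from mapping physical memory (for example, CUDA's virtual memory management APIs).

```python
# Toy simulation (hypothetical sizes): contiguous virtual KV-cache with
# physical pages mapped on demand. A Python set stands in for real page mappings.

class ContiguousKVCache:
    def __init__(self, max_tokens, bytes_per_token, page_size=2 * 1024 * 1024):
        self.max_tokens = max_tokens
        self.bytes_per_token = bytes_per_token
        self.page_size = page_size
        # Worst-case virtual reservation: consumes address space, not physical memory.
        self.reserved_bytes = max_tokens * bytes_per_token
        self.mapped_pages = set()  # indices of pages that currently have physical backing
        self.num_tokens = 0

    def virtual_offset(self, token_idx):
        # Contiguous layout: token i lives at base + i * bytes_per_token,
        # so an unmodified kernel can index it without a block table.
        if not 0 <= token_idx < self.max_tokens:
            raise IndexError(token_idx)
        return token_idx * self.bytes_per_token

    def append_token(self):
        if self.num_tokens >= self.max_tokens:
            raise MemoryError("virtual reservation exhausted")
        start = self.virtual_offset(self.num_tokens)
        end = start + self.bytes_per_token  # exclusive end of this token's slot
        first_page = start // self.page_size
        last_page = (end - 1) // self.page_size
        for page in range(first_page, last_page + 1):
            self.mapped_pages.add(page)  # "map" a physical page on first touch
        self.num_tokens += 1
        return start

    def physical_bytes(self):
        return len(self.mapped_pages) * self.page_size
```

With a 4 KiB page and 1 KiB per token, five decoded tokens touch only pages 0 and 1. Only 8 KiB is backed physically, even though roughly 1 MB of virtual space is reserved for 1,000 tokens. Because the token-to-address mapping remains a single multiplication, an attention kernel can address the cache directly. PagedAttention, by contrast, translates each token's position through a block table.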


HitPaw Video Enhancer V1.3.0 Release: Enhancing Video Smoothly with Virtual Memory

#artificialintelligence

The latest version of HitPaw Video Enhancer can automatically enable virtual memory to make video enhancement run more smoothly. If the program crashes unexpectedly, you can resume the process directly after restarting it. After this update, you can also save the operation record when you close the program. In addition, the latest version fixes some known issues. So this latest version will definitely amaze you.


How to boost PyTorch Dataset using memory-mapped files

#artificialintelligence

A memory-mapped file is a file whose contents are mapped directly onto a segment of virtual memory. This lets us operate on that segment just as we would on any other portion of main memory accessible to the current process. Because virtual memory adds this layer of abstraction, we can map files that are much larger than the machine's physical memory. The virtual memory manager automatically fetches the memory segments (called pages) that the running process needs from external storage and copies them into main memory.
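The mechanism can be sketched with Python's standard `mmap` and `struct` modules. The class below is a hypothetical example, not the article's code. It assumes a flat binary file of fixed-size records (four little-endian float32 values per sample) and implements the `__len__`/`__getitem__` protocol that `torch.utils.data.Dataset` subclasses provide, without importing PyTorch itself.

```python
import mmap
import struct

# Hypothetical record layout: 4 little-endian float32 features per sample.
RECORD_FMT = "<4f"
RECORD_SIZE = struct.calcsize(RECORD_FMT)  # 16 bytes


def write_records(path, rows):
    """Write samples as a flat binary file of fixed-size records."""
    with open(path, "wb") as f:
        for row in rows:
            f.write(struct.pack(RECORD_FMT, *row))


class MemmapDataset:
    """Dataset-style access (__len__/__getitem__) backed by a memory-mapped file."""

    def __init__(self, path):
        self._file = open(path, "rb")
        # Length 0 maps the whole file; note that mapping an empty file raises ValueError.
        self._mm = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self._n = len(self._mm) // RECORD_SIZE

    def __len__(self):
        return self._n

    def __getitem__(self, idx):
        if not 0 <= idx < self._n:
            raise IndexError(idx)
        # Reading this record touches only the page(s) that hold it; the OS
        # faults them in from disk on demand instead of loading the whole file.
        return struct.unpack_from(RECORD_FMT, self._mm, idx * RECORD_SIZE)

    def close(self):
        self._mm.close()
        self._file.close()
```

Because each lookup is just an offset computation into the mapping, the file can be far larger than RAM. Only the pages that are actually indexed occupy physical memory, and the OS may evict them again under memory pressure.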


Locality and Professional Life

Communications of the ACM

One Sunday morning nearly three decades ago, my wife Dorothy and I were walking along the Potomac River in Washington, D.C. I was considering a job change and was concerned about whether my new responsibilities would divert me from my aspiration that my work "make a mark." She asked what I meant by making a mark. That meant, I confided, that people would long remember my contribution by name. She said that if that is my philosophy of life, I am likely to be disappointed.


Steadily Learn to Drive with Virtual Memory

arXiv.org Artificial Intelligence

Reinforcement learning (RL) has shown great potential for developing high-level autonomous driving. However, on high-dimensional tasks, current RL methods suffer from low data efficiency and from oscillation during training. This paper proposes an algorithm called Learn to drive with Virtual Memory (LVM) to overcome these problems. LVM compresses high-dimensional information into compact latent states and learns a latent dynamic model to summarize the agent's experience. The latent dynamic model generates various imagined latent trajectories, which serve as virtual memory. The policy is learned by propagating gradients through the learned latent model along these imagined trajectories, which leads to high data efficiency. Furthermore, a double critic structure is designed to reduce oscillation during training. The effectiveness of LVM is demonstrated on an image-input autonomous driving task, in which LVM outperforms existing methods in data efficiency, learning stability, and control performance.
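The abstract names two mechanisms: imagined rollouts in a learned latent model, and a double critic. The sketch below illustrates both in plain Python with scalar latents; it is not the paper's code. The function names are hypothetical. The tail of each rollout is bootstrapped with the minimum of two critic estimates (the clipped double-critic target popularized by TD3), and whether LVM combines its critics exactly this way is an assumption. In LVM, the policy gradient is propagated through the learned model. This sketch computes only the imagined return that such a gradient would differentiate.

```python
# Hypothetical sketch: imagined latent rollouts plus a min-of-two-critics bootstrap.
# Plain scalars stand in for latent vectors; no autodiff is performed here.


def imagine_trajectory(z0, policy, dynamics, reward_fn, horizon):
    """Roll the learned latent model forward from z0 without the real environment."""
    states, rewards = [z0], []
    z = z0
    for _ in range(horizon):
        a = policy(z)
        rewards.append(reward_fn(z, a))
        z = dynamics(z, a)  # learned model predicts the next latent state
        states.append(z)
    return states, rewards


def imagined_return(rewards, final_state, critic1, critic2, gamma=0.99):
    """Discounted return of an imagined rollout, bootstrapped by the smaller critic."""
    # Taking the minimum of two estimates damps overestimation, one common way
    # a double critic is used to stabilize training.
    value = min(critic1(final_state), critic2(final_state))
    for r in reversed(rewards):
        value = r + gamma * value
    return value
```

For example, with dynamics `z + a`, a constant action of 1.0, a reward of 1.0 per step, critics that return 0.0 and 10.0, a horizon of 3 and `gamma=0.5`, the rollout from `z0 = 0.0` ends at latent state 3.0. The bootstrapped return is 1 + 0.5 * (1 + 0.5 * (1 + 0.5 * 0)) = 1.75.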


A Solution to the Memory Limit Challenge in Big Data Machine Learning

#artificialintelligence

The model training process in big data machine learning is both computation- and memory-intensive. Many parallel machine learning algorithms iterate a computation over a training dataset and update the related model parameters until the model converges. In the Big Data era, both the volume of a dataset and the number of model parameters can be huge. To accelerate the iterative computation, it's common to cache the training data and model parameters in memory. However, because memory is limited, in many scenarios the data and parameters might not all fit.
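The excerpt stops before the article presents its own solution. As a generic illustration of the constraint it describes, the sketch below is a bounded LRU cache of training-data partitions that reloads evicted partitions from storage. The class name, the `loader` callback, and the capacities are hypothetical, and this is not the article's method.

```python
from collections import OrderedDict


class BoundedPartitionCache:
    """Keep at most `capacity` partitions in memory; reload evicted ones via `loader`."""

    def __init__(self, capacity, loader):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.loader = loader  # hypothetical callback that reads a partition from storage
        self._cache = OrderedDict()
        self.loads = 0  # number of cache misses that hit storage

    def get(self, key):
        if key in self._cache:
            self._cache.move_to_end(key)  # mark as most recently used
            return self._cache[key]
        value = self.loader(key)
        self.loads += 1
        self._cache[key] = value
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)  # evict the least recently used partition
        return value
```

Consider two epochs over four partitions. With room for all four, the data is loaded only four times. With room for two, every access misses, for eight loads in total. A cyclic scan over more data than the cache holds defeats LRU, which is one reason iterative training on data that does not fit in memory is so costly.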


The Dream of a Lifetime

AITopics Original Links

You've likely heard stories about the birth of the PC: of Xerox PARC as the Mecca of computing; of its creation of the Alto, Ethernet, and the laser printer; of the Homebrew Computer Club, the MITS Altair, Bill Gates and the theft of his Micro-soft Basic; of Steve Jobs and Stephen Wozniak, the founding of Apple, and Jobs's visit to PARC that inspired the Macintosh. But you may not know the really early history. The stories of Doug Engelbart and John McCarthy, of the Augmentation Research Center, and of the early days of the Stanford University AI Lab (SAIL) are not well known. Yes, you may have heard that Engelbart invented the mouse, and that SAIL and Stanford led to companies like Sun and Cisco. But there are better stories, great old ones from the early days of computing, about the events that led to personal computing as we know it. In his wonderful new book, What the Dormouse Said…, John Markoff tells these stories.